- 
Differentially private synthetic data provide a powerful mechanism to enable data analysis while protecting sensitive information about individuals. However, when the data lie in a high-dimensional space, the accuracy of the synthetic data suffers from the curse of dimensionality. In this paper, we propose a differentially private algorithm to generate low-dimensional synthetic data efficiently from a high-dimensional dataset, with a utility guarantee with respect to the Wasserstein distance. A key step of our algorithm is a private principal component analysis (PCA) procedure with a near-optimal accuracy bound that circumvents the curse of dimensionality. Unlike the standard perturbation analysis, our analysis of private PCA works without assuming a spectral gap for the covariance matrix.
Free, publicly accessible full text available January 15, 2026.
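For orientation, here is a minimal sketch of one standard route to private PCA, the Gaussian-mechanism ("analyze Gauss") approach of perturbing the empirical covariance before the eigendecomposition. It is not the paper's algorithm, and the normalization, sensitivity bound, and names below are our own assumptions.

    # Minimal sketch: Gaussian-mechanism private PCA (analyze-Gauss style).
    # NOT the paper's procedure; normalization and names are illustrative.
    import numpy as np

    def private_pca(X, k, eps, delta, rng=None):
        """Top-k principal directions of the rows of X under (eps, delta)-DP.

        Assumes every row of X has L2 norm at most 1, so replacing one of
        the n rows moves the empirical covariance X^T X / n by at most 2/n
        in Frobenius norm.
        """
        rng = np.random.default_rng() if rng is None else rng
        n, d = X.shape
        cov = X.T @ X / n
        # Gaussian mechanism calibrated to Frobenius sensitivity 2/n.
        sigma = (2.0 / n) * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
        G = rng.normal(0.0, sigma, size=(d, d))
        noise = np.triu(G) + np.triu(G, 1).T  # symmetric Gaussian perturbation
        _, eigvecs = np.linalg.eigh(cov + noise)
        return eigvecs[:, -k:]  # eigenvectors for the k largest eigenvalues

The abstract's point is that the paper's utility analysis for its private PCA step avoids the spectral-gap assumption that standard perturbation arguments (e.g., Davis-Kahan) would impose on the covariance matrix.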
- 
We develop a unified approach to bounding the largest and smallest singular values of an inhomogeneous random rectangular matrix, based on the non-backtracking operator and the Ihara-Bass formula for general random Hermitian matrices with a bipartite block structure. We obtain probabilistic upper (respectively, lower) bounds for the largest (respectively, smallest) singular values of a large rectangular random matrix X. These bounds are given in terms of the maximal and minimal 2-norms of the rows and columns of the variance profile of X. The proofs involve finding probabilistic upper bounds on the spectral radius of an associated non-backtracking matrix B. The two-sided bounds can be applied to the centered adjacency matrix of sparse inhomogeneous Erdős-Rényi bipartite graphs for a wide range of sparsity, down to criticality. In particular, for Erdős-Rényi bipartite graphs G(n, m, p) with p = ω(log n)/n and m/n → y ∈ (0, 1), our sharp bounds imply that there are no outliers outside the support of the Marčenko-Pastur law almost surely. This result extends the Bai-Yin theorem to sparse rectangular random matrices.
Free, publicly accessible full text available November 1, 2025.
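For concreteness, the "no outliers" claim pins the extreme singular values to the Marčenko-Pastur edges. Under the stated scaling, and assuming the standard normalization of the centered adjacency matrix (our choice for illustration), this reads:

    \[
      \frac{s_{\max}(A - \mathbb{E}A)}{\sqrt{np(1-p)}} \;\to\; 1 + \sqrt{y},
      \qquad
      \frac{s_{\min}(A - \mathbb{E}A)}{\sqrt{np(1-p)}} \;\to\; 1 - \sqrt{y}
      \quad \text{almost surely.}
    \]

These are precisely the Bai-Yin limits, here extended down to the sparse regime p = ω(log n)/n.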
- 
We compute the eigenvalue fluctuations of uniformly distributed random biregular bipartite graphs with fixed and growing degrees for a large class of analytic functions. As a key step in the proof, we obtain a total variation distance bound for the Poisson approximation of the number of cycles and cyclically non-backtracking walks in random biregular bipartite graphs, which might be of independent interest. We also prove a semicircle law for random [Formula: see text]-biregular bipartite graphs when [Formula: see text]. As an application, we translate the results to adjacency matrices of uniformly distributed random regular hypergraphs.
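For context on the Poisson step: the classical benchmark for random d-regular graphs is that the number of k-cycles is asymptotically Poisson with mean (d-1)^k/(2k). The natural biregular bipartite analogue (notation ours; all cycles in a bipartite graph have even length) takes the form

    \[
      C_{2k} \;\xrightarrow{d}\; \mathrm{Poisson}\!\left(
        \frac{\left[(d_1 - 1)(d_2 - 1)\right]^{k}}{2k}
      \right),
    \]

where C_{2k} counts cycles of length 2k in a uniformly random (d_1, d_2)-biregular bipartite graph; the paper's contribution includes a quantitative total variation bound for approximations of this kind.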
- 
Ruiz, Francisco; Dy, Jennifer; van de Meent, Jan-Willem (Eds.)
We investigate the properties of random feature ridge regression (RFRR) given by a two-layer neural network with random Gaussian initialization. We study the non-asymptotic behavior of RFRR with nearly orthogonal deterministic unit-length input data vectors in the overparameterized regime, where the width of the first layer is much larger than the sample size. Our analysis shows high-probability non-asymptotic concentration results for the training errors, cross-validations, and generalization errors of RFRR centered around their respective values for a kernel ridge regression (KRR). This KRR is derived from an expected kernel generated by a nonlinear random feature map. We then approximate the performance of the KRR by a polynomial kernel matrix obtained from the Hermite polynomial expansion of the activation function, whose degree depends only on the orthogonality among different data points. This polynomial kernel determines the asymptotic behavior of both the RFRR and the KRR. Our results hold for a wide variety of activation functions and input data sets that exhibit nearly orthogonal properties. Based on these approximations, we obtain a lower bound for the generalization error of the RFRR for a nonlinear student-teacher model.
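A minimal sketch of the RFRR setup described above, using a ReLU activation and our own illustrative names (the paper covers a wide class of activations and works in the regime where the width far exceeds the sample size):

    # Illustrative sketch of random feature ridge regression (RFRR).
    # Activation, scaling, and parameter names are assumptions, not the paper's.
    import numpy as np

    def rfrr_fit_predict(X_train, y_train, X_test, width, ridge, rng):
        """Two-layer model: fixed random Gaussian first layer, ridge on top."""
        d = X_train.shape[1]
        W = rng.normal(size=(width, d)) / np.sqrt(d)  # random first-layer weights
        act = lambda Z: np.maximum(Z, 0.0)            # ReLU, one admissible choice
        F_tr = act(X_train @ W.T) / np.sqrt(width)    # random feature map
        F_te = act(X_test @ W.T) / np.sqrt(width)
        # Dual (kernel) form: with width >> n, solve the n x n system
        # instead of the width x width one.
        K = F_tr @ F_tr.T
        alpha = np.linalg.solve(K + ridge * np.eye(K.shape[0]), y_train)
        return F_te @ (F_tr.T @ alpha)

    # Nearly orthogonal unit-length inputs, as in the abstract, arise from
    # random unit vectors in high dimension (toy data, hypothetical target):
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2000))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    X_new = rng.normal(size=(20, 2000))
    X_new /= np.linalg.norm(X_new, axis=1, keepdims=True)
    y = X[:, 0]
    preds = rfrr_fit_predict(X, y, X_new, width=4096, ridge=1e-2, rng=rng)

As the width grows, the empirical kernel K concentrates around the expected kernel of the KRR that the abstract uses as its comparison point.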